Create a cli command which wraps Claude Agend SDK by deti · Pull Request #1 · deti/silly-scripts

deti · 2026-03-10T05:38:42Z

Create a cli command which wraps Claude Agend SDK. it should receive prompt as parameter and run it with claude. howing all output of claude

… Here's a summary of the key architectural decisions: --- ### Design Overview: `ask-claude` CLI **Runtime:** Python 3.14+ — matches the existing project. TypeScript rejected (would add a second runtime to a Python-only codebase). **SDK Entry Point:** `claude_agent_sdk.query()` async iterator — streams messages incrementally, manages the full agentic loop (tool execution, retries, context) internally. `ClaudeSDKClient` deferred to a future version when session resume is needed. **CLI Interface (Click):** - Prompt via positional arg, `--prompt` flag, or stdin pipe - Flags for `--model`, `--tools`, `--system-prompt`, `--permission-mode`, `--working-dir`, `--verbose`, `--json` - One new file: `src/silly_scripts/cli/ask_claude.py` — follows the project's established one-module-per-command pattern **Streaming Strategy:** - **Default:** Assistant text → stdout (real-time), tool-call summaries → stderr - **`--json`:** Full NDJSON to stdout for machine consumption - stdout/stderr separation is a deliberate public contract enabling `answer=$(ask-claude "...")` **Error Handling:** Six distinct exit codes (0/1/2/3/4/5/99/130) following POSIX conventions. No stack traces by default; `--verbose` enables them. API key validated before SDK call, never accepted as a flag (security). **Configuration:** Defaults sourced from pydantic-settings (`Settings` class) with precedence: CLI flag > env var > hardcoded default. **Key Invariant:** `ask_claude.py` is the only module that imports `claude_agent_sdk` — no SDK leakage into other parts of the codebase. Sources: - [Claude Agent SDK Overview](https://platform.claude.com/docs/en/agent-sdk/overview) - [Claude Agent SDK Quickstart](https://platform.claude.com/docs/en/agent-sdk/quickstart)

## Summary ### Files modified: 1. **`pyproject.toml`** — Added `claude-agent-sdk` dependency and `ask-claude` entry point 2. **`src/silly_scripts/settings.py`** — Added 3 new settings fields: `anthropic_api_key`, `claude_default_model`, `claude_default_tools` 3. **`env.template`** — Added Claude-related env var documentation ### Files created: 4. **`src/silly_scripts/cli/ask_claude.py`** — Full Click command scaffold with all flags from the ADR (`--model`, `--tools`, `--system-prompt`, `--permission-mode`, `--working-dir`, `--verbose`, `--json`), prompt resolution from positional arg / `--prompt` flag / stdin pipe, and helper functions (`_resolve_prompt`, `_parse_tools`) 5. **`tests/cli/test_ask_claude.py`** — 17 tests covering prompt resolution (6 tests), tool parsing (4 tests), and CLI integration (7 tests) 6. **`tests/test_settings.py`** — 2 new tests for the Claude settings defaults and env overrides ### Acceptance criteria verified: - ✅ Project skeleton exists with the defined structure (`ask_claude.py` under `src/silly_scripts/cli/`) - ✅ `claude-agent-sdk` is listed as a dependency and installed (v0.1.48) - ✅ `ask-claude --help` prints usage without crashing (exit 0) - ✅ `ask-claude` with no args prints a user-friendly error (exit 2) - ✅ All 101 new + existing tests pass (2 pre-existing failures in `test_speech_to_text.py` are unrelated) - ✅ Lint and format checks pass clean

## Summary ### Changes to `src/silly_scripts/cli/ask_claude.py` Replaced the "Not yet implemented" stub with full Claude Agent SDK integration: 1. **API key validation** — Checks `settings.anthropic_api_key` before attempting a query; produces a clear `ClickException` with instructions if missing (non-zero exit code). 2. **SDK `query()` integration** — Uses the `claude_agent_sdk.query()` async function to send the prompt and stream responses. The async coroutine is run via `asyncio.run()`. 3. **Real-time streaming output** — `_run_query()` iterates over messages from the SDK: - `AssistantMessage` with `TextBlock` → printed to stdout immediately - `ToolUseBlock` → printed to stderr only when `--verbose` is set - `SystemMessage` → printed to stderr only when `--verbose` is set - `ResultMessage` → logs completion stats; raises error if `is_error=True` 4. **JSON/NDJSON mode** — `--json` flag serializes each message as a single JSON line to stdout via `dataclasses.asdict()`. 5. **Error handling** — `CLINotFoundError`, `ProcessError`, and `ClaudeSDKError` are caught and converted to `click.ClickException` with meaningful messages and non-zero exit codes. 6. **Options forwarding** — All CLI flags (`--model`, `--tools`, `--system-prompt`, `--permission-mode`, `--working-dir`) are correctly forwarded to `ClaudeAgentOptions`. ### Changes to `tests/cli/test_ask_claude.py` Expanded from 17 tests to **27 tests** covering: - All original prompt resolution and tool parsing tests (preserved) - **New `TestPrintAssistant`** (3 tests): text block output, tool-use hiding, verbose tool-use display - **New `TestPrintJson`** (2 tests): dataclass serialization, non-dataclass fallback - **New `TestRunQuery`** (3 tests): streaming text output, error result handling, NDJSON mode - **Updated `TestCli`** (9 tests): help flag, no-args error, missing API key error, successful query (exit 0), SDK error handling, CLI-not-found handling, invalid permission mode, stdin piping, model flag forwarding All SDK calls are properly mocked — no real network calls in tests. ### Acceptance Criteria Verification - ✅ CLI accepts a prompt as a positional argument - ✅ All Claude output is displayed in real time as it is generated (streamed via async iterator) - ✅ Missing API key produces a clear error message and non-zero exit code - ✅ Empty or missing prompt produces a usage error and non-zero exit code - ✅ Successful execution exits with code 0 - ✅ All 111 tests pass (2 pre-existing failures in `test_speech_to_text.py` are unrelated) - ✅ Lint and format checks pass clean

…t me compile my findings. --- # Code Review: `ask-claude` CLI Implementation (task-003) **Reviewer:** 🔍 code-reviewer **Verdict:** ✅ **APPROVE** — with non-blocking suggestions The implementation is well-crafted, follows project conventions closely, and faithfully implements the ADR-001 design. No blocking issues found. Below are categorized findings. --- ## ✅ Security: No Hardcoded Secrets - **No API keys in source code.** Confirmed via grep — no hardcoded keys or secrets anywhere in `ask_claude.py`. - **API key sourced correctly** from `settings.anthropic_api_key` (pydantic-settings → env var / `.env` file). No `--api-key` CLI flag, matching the design's explicit security decision. - **Test file uses `"sk-test-key"`** in `patch.dict("os.environ", ...)` — this is appropriate for test mocking and never touches a real API. **Pass.** No issues. --- ## ✅ Design Adherence (ADR-001) | Design Requirement | Implementation | Status | |---|---|---| | Click command with `main()` entry point | ✅ `@click.command() def main(...)` | ✔ | | `pyproject.toml` entry point | ✅ `ask-claude = "silly_scripts.cli.ask_claude:main"` | ✔ | | Prompt resolution: positional → flag → stdin | ✅ `_resolve_prompt()` | ✔ | | `query()` async iterator (not `ClaudeSDKClient`) | ✅ `async for message in query(...)` | ✔ | | API key validated before SDK call | ✅ Lines 165-171 | ✔ | | Text → stdout, tool calls → stderr | ✅ `click.echo()` / `click.echo(..., err=True)` | ✔ | | `--json` NDJSON mode | ✅ `_print_json()` | ✔ | | Settings via pydantic-settings | ✅ `get_settings()` for model/tools defaults | ✔ | | Module structure matches design | ✅ Single module, correct function decomposition | ✔ | **Pass.** Implementation matches the design faithfully. --- ## ⚠️ Non-Blocking Findings ### 1. **[suggestion] Incomplete error categorization vs. design spec** — Error Handling The ADR specifies **7 distinct error categories** with specific exit codes (2–5, 99, 130). The implementation catches only 3 SDK exception types (`CLINotFoundError`, `ProcessError`, `ClaudeSDKError`) and maps all of them to `click.ClickException` (exit code 1). **Missing from the implementation:** - **SDK authentication failure** (exit code 3 per design) - **Rate limit / quota** (exit code 4 per design) - **Network / transient error** (exit code 5 per design) - **Unexpected exception** catch-all with exit code 99 - **KeyboardInterrupt** handler (exit code 130) **Why it matters:** The design specifies these as distinct categories for scripting consumers who may key on exit codes. Currently, all errors exit with code 1. **Recommendation:** This is acceptable for v1 if the SDK doesn't expose distinct exception types for auth/rate-limit/network errors yet. If it does, these should be added. At minimum, a `KeyboardInterrupt` handler producing a clean exit (`\nInterrupted.`) would be a nice polish. Consider adding a comment noting the gap: `# TODO: Differentiate auth/rate-limit/network errors when SDK exposes specific exception types`. **Severity:** Non-blocking (the current behavior is correct — errors are caught and reported — just not as granular as the design specified). --- ### 2. **[suggestion] Default mode tool-call output differs from design** — Streaming/Output The ADR Section 5 states: > **Default mode:** Tool invocations are printed as single summary lines to **stderr** (so they can be suppressed or redirected independently). But the implementation **hides tool calls entirely** in non-verbose mode (line 245: `elif isinstance(block, ToolUseBlock) and verbose:`). Tool calls are only shown with `--verbose`. **Why it matters:** The design intended a middle ground — short tool summaries always on stderr (e.g., `[tool] Read: src/auth.py`), with full inputs only in verbose mode. The implementation made a reasonable UX call (less noise), but it deviates from the design's explicit output contract. **Recommendation:** This is arguably a better UX decision, but the ADR should be updated to match the implementation, or the implementation should show short summaries in default mode. A quick `# NOTE: Deviates from ADR-001 — tool calls hidden by default for cleaner output` comment would suffice. **Severity:** Non-blocking. --- ### 3. **[suggestion] `model` not passed to `ClaudeAgentOptions` constructor** — Correctness (Minor) Line 173: `options = ClaudeAgentOptions(model=resolved_model, ...)` — this looks correct. However, `system_prompt` is set as a post-construction attribute assignment (line 180: `options.system_prompt = system_prompt`) rather than in the constructor. This is inconsistent — if the `ClaudeAgentOptions` dataclass/model validates fields at construction time, post-assignment might bypass validation. **Recommendation:** Pass `system_prompt` in the constructor if the SDK supports it, for consistency: ```python options = ClaudeAgentOptions( model=resolved_model, allowed_tools=resolved_tools, permission_mode=permission_mode, cwd=str(working_dir) if working_dir else None, system_prompt=system_prompt, # None is acceptable ) ``` **Severity:** Non-blocking. Current code works; this is a cleanliness suggestion. --- ### 4. **[suggestion] f-string in `logger.debug()` calls** — Performance Lines 157-163 use f-string interpolation in `logger.debug()` calls. Per Python logging best practices, lazy formatting with `%s` is preferred because f-strings are evaluated even when the log level is above DEBUG: ```python # Current logger.debug(f"Prompt: {resolved_prompt}") # Recommended logger.debug("Prompt: %s", resolved_prompt) ``` **Severity:** Non-blocking. The project's CLAUDE.md explicitly states "Uses f-string log messages" as a convention, so this is consistent with project standards. Noting it for awareness only — **do not change**. --- ## ✅ Readability & Conventions - **Module structure** follows the documented order: docstring → imports → logger → helpers → `@click.command()` main → `__main__` guard ✔ - **Import organization**: stdlib → third-party → local ✔ - **Type hints** on all function signatures and return types ✔ - **Google-style docstrings** with `Args:`, `Returns:`, `Raises:` ✔ - **`click.echo()` used** (not `print()`) — respects `T20` lint rule ✔ - **Naming** is clear and descriptive throughout ✔ - **`# pragma: no cover`** on `__main__` guard ✔ **Pass.** Clean, idiomatic code. --- ## ✅ Test Adequacy **27 tests** with strong coverage: - ✅ Prompt resolution (6 tests covering positional, flag, stdin, empty, precedence) - ✅ Tool parsing (4 tests covering comma-sep, whitespace, empty entries, single) - ✅ Message printing (3 tests: text blocks, hidden tool-use, verbose tool-use) - ✅ JSON serialization (2 tests: dataclass, fallback) - ✅ Async streaming (3 tests: text output, error result, NDJSON mode) - ✅ CLI integration (9 tests: help, no-args, missing key, success, SDK error, CLI-not-found, invalid permission, stdin pipe, model forwarding) - ✅ All SDK calls properly mocked — no real network calls - ✅ `autouse` fixture for `get_settings.cache_clear()` — follows project conventions - ✅ Test helper factories (`_make_success_messages`, `_make_result_only`) reduce duplication **One minor observation:** The `ProcessError` exception path is not tested (only `ClaudeSDKError` and `CLINotFoundError` have CLI integration tests). Consider adding a test for `ProcessError` handling. **Pass.** Test coverage is solid. --- ## Summary | Category | Verdict | |---|---| | Security (no hardcoded secrets) | ✅ Pass | | Design adherence | ✅ Pass (2 minor deviations noted) | | Error handling | ✅ Pass (functional, less granular than design spec) | | Readability & conventions | ✅ Pass | | Test coverage | ✅ Pass | **Overall: APPROVE.** The implementation is clean, secure, well-tested, and faithfully follows the architectural design. The non-blocking suggestions above are improvements for consideration, not merge blockers.

…hanges made: ## Changes Applied ### `src/silly_scripts/cli/ask_claude.py` 1. **KeyboardInterrupt handler** (Finding #1): Added a `KeyboardInterrupt` catch that prints `"\nInterrupted."` to stderr and exits with code 130 (POSIX convention for SIGINT). 2. **TODO comment for granular exit codes** (Finding #1): Added a TODO on the `ClaudeSDKError` catch noting that auth/rate-limit/network errors should be differentiated when the SDK exposes specific exception types. 3. **ADR deviation note** (Finding #2): Added a `# NOTE` comment documenting the intentional deviation from ADR-001 where tool calls are hidden by default for cleaner output. 4. **Constructor consistency** (Finding #3): Moved `system_prompt` into the `ClaudeAgentOptions` constructor call instead of post-construction assignment, for consistency and proper validation. 5. **Finding #4 (f-string logging)**: Not changed — reviewer explicitly said "do not change" as it matches the project's `CLAUDE.md` convention. ### `tests/cli/test_ask_claude.py` 6. **ProcessError test**: Added `test_process_error_exits_nonzero` to cover the previously untested `ProcessError` exception path. 7. **KeyboardInterrupt test**: Added `test_keyboard_interrupt_exits_130` to verify the new handler produces exit code 130. **All 29 tests pass.** ✅

All **69 tests** in `tests/cli/test_ask_claude.py` pass (up from 29 previously). The 2 failures in the full suite are pre-existing `test_speech_to_text.py` issues unrelated to our changes. ### What was added (40 new tests) **Unit tests — Prompt resolution edge cases** (`TestResolvePromptEdgeCases`, 6 tests): - Empty string positional falls through to flag - Empty string for both positional and flag raises `UsageError` - Whitespace-only positional is accepted (truthy) - Very long prompt (100k chars) accepted without truncation - Newline-containing prompts preserved - Unicode prompts preserved **Unit tests — Tool parsing edge cases** (`TestParseToolsEdgeCases`, 3 tests): - Empty string → empty list - Commas-only → empty list - Whitespace-only entries → empty list **Unit tests — Print assistant edge cases** (`TestPrintAssistantEdgeCases`, 4 tests): - Multiple text blocks printed in order - Mixed text + tool blocks (tool hidden when not verbose) - Empty content list → no output - Empty text block → newline only **Unit tests — Print JSON edge cases** (`TestPrintJsonEdgeCases`, 2 tests): - `ResultMessage` with `total_cost_usd` serializes correctly - `AssistantMessage` with `ToolUseBlock` serializes correctly **Async unit tests — _run_query edge cases** (`TestRunQueryEdgeCases`, 6 tests): - Multiple assistant messages stream incrementally - `SystemMessage` hidden when not verbose - `SystemMessage` shown on stderr when verbose - Error result includes session ID in exception message - JSON mode serializes system messages as NDJSON - Very long prompt forwarded to SDK without truncation **Integration tests — Options forwarding** (`TestCliOptionsForwarding`, 10 tests): - `--tools` flag parsed and forwarded to `allowed_tools` - `--system-prompt` forwarded - `--permission-mode` forwarded - `--working-dir` forwarded to `cwd` - `-p` short form accepted - `-m` short form accepted - `-t` short form accepted - Default model from settings - Default tools from settings - Prompt text forwarded verbatim **Integration tests — Output behavior** (`TestCliOutputBehavior`, 5 tests): - Streamed text from multiple messages appears in stdout - `--json` flag produces valid NDJSON via CLI - `--verbose` shows tool calls in output - `ResultMessage` with `is_error=True` exits non-zero via CLI - Very long prompt (50k chars) accepted via CLI **Integration tests — Exit codes** (`TestExitCodes`, 4 tests): - Success → exit code 0 - Usage error (invalid option) → exit code 2 - `ClickException` (missing API key) → exit code 1 - `KeyboardInterrupt` → exit code 130 ### Acceptance Criteria Verification | Criterion | Status | |---|---| | Unit tests cover argument parsing (valid prompt, missing prompt, help flag) | ✅ | | Unit tests cover error paths (missing API key, SDK errors) | ✅ | | Integration test confirms CLI produces streamed output to stdout | ✅ | | Integration test confirms correct exit codes for success and failure | ✅ | | Edge cases: empty prompt, very long prompt | ✅ | | All tests pass in CI-compatible execution | ✅ (69/69 pass) |

Here's what was done: ### README.md — Complete rewrite with: - **Installation** section: Prerequisites (Python 3.14+, uv), clone + `make init`, and manual alternative - **Configuration** section: `.env` setup with `ANTHROPIC_API_KEY`, configuration table for all Claude-related env vars, link to `env.template` - **Usage** section: Full `--help` output reproduced, plus **5 usage examples**: 1. Simple question (positional prompt) 2. Piped stdin input 3. Model selection with verbose output 4. Custom tools with permission mode and working directory 5. NDJSON output piped to `jq` - **Other scripts** table listing all CLI commands - **Development** section with `make` targets ### `--help` flag The CLI already had a clear `--help` with examples embedded in the docstring (Click's `\b` block). It matches the README documentation exactly. No changes needed to the CLI itself. ### Test update Added assertions to `test_help_flag` to verify `--system-prompt`, `--permission-mode`, `--working-dir`, `Examples:`, and `ask-claude` are present in `--help` output, ensuring the help text stays in sync with the README. ### Results - Lint: all checks passed - Tests: 69/69 ask-claude tests pass; 2 pre-existing failures in `test_speech_to_text.py` (unrelated Deepgram issue)

deti added 7 commits March 8, 2026 14:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create a cli command which wraps Claude Agend SDK#1

Create a cli command which wraps Claude Agend SDK#1
deti wants to merge 7 commits intomasterfrom
vibe/20260308-140045-90d7edad-b0bc-435a-8aca-ea7193d592f0

deti commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

deti commented Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant